Fix error after remote tidy timeout #4465
Conversation
```python
# Terminate any remaining commands
for platform_n, (cmd, proc) in procs.items():
```
This was the offending line
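For context, here is one way a loop of that shape can raise the `ValueError` the PR description mentions: if the dict's values stop being two-element tuples, the inline unpacking fails. This is a minimal standalone reproduction with hypothetical data, not the actual `TaskRemoteMgr` structures, and whether this is the exact failure mode is an assumption.

```python
# Hypothetical data: one value is a 3-tuple, so the (cmd, proc)
# unpacking in the loop header cannot match it.
procs = {
    "platform_a": ("cmd_a", "proc_a", "extra"),
}

try:
    for platform_n, (cmd, proc) in procs.items():
        pass
except ValueError as exc:
    print(exc)  # too many values to unpack (expected 2)
```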
Force-pushed from 4818895 to b40a558
Nice, kudos if that works out 🥇
```python
self.bad_hosts.add(host)
while queue and time() < timeout:
    item = queue.popleft()
    if item.proc.poll() is None:  # proc still running
```
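The polling pattern under discussion can be sketched as a standalone example roughly like this (an illustration only, assuming a simple `Item` wrapper as a stand-in for the real queue items; not the actual `TaskRemoteMgr` code):

```python
from collections import deque, namedtuple
from subprocess import Popen
from time import time, sleep

Item = namedtuple('Item', ['proc'])  # stand-in for the real queue item

# Launch a few short-lived processes, then poll them round-robin.
queue = deque(Item(Popen(['sleep', '0.2'])) for _ in range(3))
timeout = time() + 10

while queue and time() < timeout:
    item = queue.popleft()
    if item.proc.poll() is None:  # proc still running
        queue.append(item)        # re-queue it and check the next one
        sleep(0.05)               # avoid maxing out the CPU
    else:
        print('finished with return code', item.proc.returncode)
```

Because `poll()` never blocks, all the processes keep running concurrently while the loop cycles through the queue; the short `sleep` is what keeps the loop from busy-waiting.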
Could use `proc.wait(timeout=10)` to avoid this polling loop and the subsequent termination loop, e.g.:

```python
for proc in procs:
    try:
        if proc.wait(10):
            ...
        else:
            ...
    except subprocess.TimeoutExpired:
        ...
```
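Filled in, that suggestion might look like the following. This is a sketch under the assumptions that a truthy (non-zero) return code marks a failed command and that stragglers should be killed on timeout; the handler bodies are invented for illustration.

```python
from subprocess import Popen, TimeoutExpired

procs = [Popen(['sleep', '0.2']) for _ in range(3)]

for proc in procs:
    try:
        if proc.wait(10):          # truthy = non-zero return code
            print('command failed:', proc.returncode)
        else:
            print('command succeeded')
    except TimeoutExpired:
        proc.kill()                # give up and terminate the straggler
        print('command timed out')
```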
I think that would negate the concurrency in the way the queue is currently handled?
The processes would still be concurrent; only the result retrieval would be sequential:

```python
from subprocess import Popen
from time import time

procs = [
    Popen(['sleep', '2'])
    for _ in range(5)
]
start = time()
for proc in procs:
    proc.wait()
print(f'{time() - start}')
```

```console
$ python test.py
2.007575035095215
```
I see, but given that retries for SSH 255 failures are queued after result retrieval, it would still slow things down in those cases, I'd have thought.
Ah, damn, adding to the list whilst processing it makes a mess of things.
Number of failed tests (before retry) now a lot less.
I have checked out the branch and read the code. The only issue I have found is that the purge has been commented out in the intelligent host selection test:
tests/functional/intelligent-host-selection/01-periodic-clear-badhosts.t
Avoid maxing out CPU!
Force-pushed from 3a7321b to 19d7e59
👍 thanks @MetRonnie
This is a small change with no associated Issue.

Fix a `ValueError` that could occur during remote tidy at shutdown. I think that might have been why `_remote_background_indep_poll` was so flaky on GH Actions. In the process I refactored `TaskRemoteMgr.remote_tidy()` to use a queue of processes instead of a dict, much like how remote clean works.

Requirements check-list:
- I have read `CONTRIBUTING.md` and added my name as a Code Contributor.
- No dependency changes to `setup.py` and `conda-environment.yml`.
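The queue-of-processes approach described above, including re-queueing SSH 255 failures for a retry, might be sketched like this. Everything here is hypothetical and simplified: the `tidy_all` name, the `launch` callback, the single-retry policy, and the 255 handling are assumptions drawn from the discussion, not the actual cylc implementation.

```python
from collections import deque
from subprocess import Popen
from time import time, sleep

SSH_FAIL = 255  # conventional ssh connection-failure return code


def tidy_all(launch, hosts, timeout=30.0):
    """Poll a queue of processes, re-queueing one retry per SSH failure."""
    queue = deque((host, launch(host), 0) for host in hosts)
    deadline = time() + timeout
    results = {}
    while queue and time() < deadline:
        host, proc, tries = queue.popleft()
        ret = proc.poll()
        if ret is None:                 # still running: put it back
            queue.append((host, proc, tries))
            sleep(0.05)                 # avoid busy-waiting
        elif ret == SSH_FAIL and tries == 0:
            queue.append((host, launch(host), 1))  # one retry
        else:
            results[host] = ret
    for host, proc, _ in queue:         # terminate anything left over
        proc.terminate()
    return results
```

Because retries are appended to the same queue the loop is draining, they run concurrently with the remaining first attempts, which is the concurrency point raised in the review thread.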